Dissertation Performance Portable Short Vector Transforms
Author
Abstract
Abstracting from Special Machine Features

In the context of this thesis, all short vector SIMD extensions feature the functionality required in intermediate level building blocks. However, the implementation of such building blocks depends on special features of the target architecture. For instance, a complex reordering operation like a permutation has to be implemented using the register-register permutation instructions provided by the target architecture. In addition, restrictions like aligned memory access have to be handled. Thus, a set of intermediate building blocks has to be defined which (i) can be implemented on all current short vector SIMD architectures and (ii) enables all discrete linear transforms to be built on top of these building blocks. This set is called the portable SIMD API. Appendix B describes the relevant parts of the instruction sets provided by current short vector SIMD extensions.

Abstracting from Special Compiler Features

All compilers featuring a short vector SIMD C language extension provide the functionality required to implement the portable SIMD API. But syntax and semantics differ from platform to platform and from compiler to compiler. These specifics have to be hidden in the portable SIMD API. Table 3.1 (on page 40) shows that compilers with short vector SIMD language extensions exist for every current short vector SIMD extension.

6.1 Definition of the Portable SIMD API

The portable SIMD API includes macros of four types: (i) data types, (ii) constant handling, (iii) arithmetic operations, and (iv) extended memory operations. An overview of the provided macros is given below. Appendix C contains examples of actual implementations of the portable SIMD API on various platforms. All examples of such macros displayed in this section assume a two-way or four-way short vector SIMD extension. The portable SIMD API can be extended to arbitrary vector length ν.
Thus, optimization techniques like loop interleaving (Gatlin and Carter [37]) can be implemented on top of the portable SIMD API.
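The four macro categories named above can be illustrated with a minimal sketch in C. The macro and type names below (simd_vector, SIMD_SET_CONST, VEC_ADD, VEC_MUL, LOAD_VECT, STORE_VECT, SIMD_WIDTH) and the scale_add kernel are hypothetical and chosen for illustration only; they are not the names defined in the thesis or its Appendix C. The sketch assumes a four-way single-precision extension, maps the macros to Intel SSE intrinsics when the compiler defines __SSE__, and otherwise falls back to a plain C emulation. It uses unaligned loads and stores to sidestep the alignment restriction mentioned above, which a real implementation would have to handle explicitly.

```c
#include <stddef.h>

/* Hypothetical portable SIMD API sketch: all names are illustrative. */
#define SIMD_WIDTH 4 /* assumed four-way single-precision vectors */

#if defined(__SSE__)
#include <xmmintrin.h>

/* (i) data type */
typedef __m128 simd_vector;
/* (ii) constant handling: broadcast one scalar to all lanes */
#define SIMD_SET_CONST(c)  _mm_set1_ps(c)
/* (iii) arithmetic operations */
#define VEC_ADD(a, b)      _mm_add_ps((a), (b))
#define VEC_MUL(a, b)      _mm_mul_ps((a), (b))
/* (iv) memory operations (unaligned variants, for simplicity) */
#define LOAD_VECT(p)       _mm_loadu_ps(p)
#define STORE_VECT(p, v)   _mm_storeu_ps((p), (v))

#else /* plain C emulation with the same interface */

typedef struct { float e[SIMD_WIDTH]; } simd_vector;

static inline simd_vector simd_set_const(float c)
{
    simd_vector r;
    for (int i = 0; i < SIMD_WIDTH; ++i) r.e[i] = c;
    return r;
}
static inline simd_vector vec_add(simd_vector a, simd_vector b)
{
    simd_vector r;
    for (int i = 0; i < SIMD_WIDTH; ++i) r.e[i] = a.e[i] + b.e[i];
    return r;
}
static inline simd_vector vec_mul(simd_vector a, simd_vector b)
{
    simd_vector r;
    for (int i = 0; i < SIMD_WIDTH; ++i) r.e[i] = a.e[i] * b.e[i];
    return r;
}
static inline simd_vector load_vect(const float *p)
{
    simd_vector r;
    for (int i = 0; i < SIMD_WIDTH; ++i) r.e[i] = p[i];
    return r;
}
static inline void store_vect(float *p, simd_vector v)
{
    for (int i = 0; i < SIMD_WIDTH; ++i) p[i] = v.e[i];
}

#define SIMD_SET_CONST(c)  simd_set_const(c)
#define VEC_ADD(a, b)      vec_add((a), (b))
#define VEC_MUL(a, b)      vec_mul((a), (b))
#define LOAD_VECT(p)       load_vect(p)
#define STORE_VECT(p, v)   store_vect((p), (v))
#endif

/* Kernel written only against the macros: y[i] = a * x[i] + y[i]. */
static void scale_add(float a, const float *x, float *y, size_t n)
{
    simd_vector va = SIMD_SET_CONST(a);
    size_t i = 0;
    /* Vectorized part: SIMD_WIDTH elements per iteration. */
    for (; i + SIMD_WIDTH <= n; i += SIMD_WIDTH) {
        simd_vector vx = LOAD_VECT(x + i);
        simd_vector vy = LOAD_VECT(y + i);
        STORE_VECT(y + i, VEC_ADD(VEC_MUL(va, vx), vy));
    }
    /* Scalar tail for the remaining n mod SIMD_WIDTH elements. */
    for (; i < n; ++i)
        y[i] = a * x[i] + y[i];
}
```

Because scale_add refers only to the macros, the same kernel source compiles unchanged on either path; porting to another extension would, under these assumptions, mean supplying another block of macro definitions rather than rewriting the kernel.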
Similar resources
An Abstraction Layer for SIMD Extensions
This paper presents an abstraction layer for short vector SIMD ISA extensions like Intel’s SSE, AMD’s 3DNow!, Motorola’s AltiVec, and IBM’s Double Hummer. It provides unified access to short vector instructions via intermediate level building blocks. These primitives are C macros that allow, for instance, portable and highly efficient implementations of discrete linear transforms like FFTs and ...
A Portable Short Vector Version of Fftw
This paper presents a portable short vector extension for the popular FFT library Fftw. Fftw is a freely available portable FFT software library that achieves top performance across a large number of platforms. The newly developed extension enables the utilization of short vector extensions like Intel’s SSE and SSE 2 as well as Motorola’s AltiVec for any problem size. The method is independent...
Omnidirectionally Balanced Multiwavelets for Vector Wavelet Transforms
Vector wavelet transforms for vector-valued fields can be implemented directly from multiwavelets; however, existing multiwavelets offer surprisingly poor performance for transforms in vector-valued signal-processing applications. In this paper, the reason for this performance failure is identified, and a remedy is proposed. A multiwavelet design criterion, omnidirectional balancing, is introdu...
Wavelet transforms for vector fields using omnidirectionally balanced multiwavelets
Vector wavelet transforms for vector-valued fields can be implemented directly from multiwavelets; however, existing multiwavelets offer surprisingly poor performance for transforms in vector-valued signal-processing applications. In this paper, the reason for this performance failure is identified, and a remedy is proposed. A multiwavelet design criterion, omnidirectional balancing, is introduc...
Classical Wavelet Transforms over Finite Fields
This article introduces a systematic study for computational aspects of classical wavelet transforms over finite fields using tools from computational harmonic analysis and also theoretical linear algebra. We present a concrete formulation for the Frobenius norm of the classical wavelet transforms over finite fields. It is shown that each vector defined over a finite field can be represented as...